Abstract

The problem of pricing Bermudan options using Monte Carlo and 
a nonparametric regression is considered. We derive optimal non- 
asymptotic bounds for a lower biased estimate based on the subop- 
timal stopping rule constructed using some estimates of continuation 
values. These estimates may be of a different nature (local or global);
the only requirement is that the deviations of these estimates from
the true continuation values can be uniformly bounded
in probability. As an illustration, we discuss a class of local polynomial 
estimates which, under some regularity conditions, yield continuation 
values estimates possessing this property. 

Keywords: Bermudan options, Nonparametric regression, Bound- 
ary condition, Suboptimal stopping rule 

1 Introduction 

An American option grants the holder the right to select the time at which
to exercise the option, and in this differs from a European option which may
be exercised only at a fixed date. A general class of American option pricing
problems can be formulated through an $\mathbb{R}^d$ Markov process
$\{X(t),\ 0 \le t \le T\}$ defined on a filtered probability space
$(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0 \le t \le T}, \mathrm{P})$. It is assumed
that $X(t)$ is adapted to $(\mathcal{F}_t)_{0 \le t \le T}$ in the sense that each
$X(t)$ is $\mathcal{F}_t$ measurable. Recall that each $\mathcal{F}_t$ is a
$\sigma$-algebra of subsets of $\Omega$ such that
$\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$ for $s \le t$.
We interpret $\mathcal{F}_t$ as all relevant financial information available up
to time $t$. We restrict attention to options admitting a finite set of exercise
opportunities $0 = t_0 < t_1 < t_2 < \ldots < t_L = T$, sometimes called Bermudan
options. If exercised at time $t_l$, $l = 1, \ldots, L$, the option pays
$f_l(X(t_l))$, for some known functions $f_0, f_1, \ldots, f_L$ mapping
$\mathbb{R}^d$ into $[0, \infty)$. Let $\mathcal{T}_n$ denote the set of stopping
times taking values in $\{n, n+1, \ldots, L\}$. A standard result in the theory
of contingent claims states that the equilibrium price $V_n(x)$ of the American
option at time $t_n$ in state $x$, given that the option was not exercised prior
to $t_n$, is its value under an optimal exercise policy:

$V_n(x) = \sup_{\tau \in \mathcal{T}_n} \mathrm{E}[f_\tau(X(t_\tau)) \mid X(t_n) = x], \quad x \in \mathbb{R}^d.$

Pricing an American option thus reduces to solving an optimal stopping 
problem. Solving this optimal stopping problem and pricing an American 
option are straightforward in low dimensions. However, many problems 
arising in practice (see e.g. Glasserman (2004)) have high dimensions, and 
these applications have motivated the development of Monte Carlo methods
for pricing American options. Pricing American style derivatives with
Monte Carlo is a challenging task because the determination of optimal ex- 
ercise strategies requires a backwards dynamic programming algorithm that 
appears to be incompatible with the forward nature of Monte Carlo sim- 
ulation. Much research was focused on the development of fast methods 
to compute approximations to the optimal exercise policy. Notable exam- 
ples include the functional optimization approach in Andersen (2000), mesh 
method of Broadie and Glasserman (1997), the regression-based approaches 
of Carriere (1996), Longstaff and Schwartz (2001), Tsitsiklis and Van Roy 
(1999) and Egloff (2005). A common feature of all the above mentioned
algorithms is that they deliver estimates
$\hat C_0(x), \ldots, \hat C_{L-1}(x)$ for the so called
continuation values:

(1.1) $C_k(x) := \mathrm{E}[V_{k+1}(X(t_{k+1})) \mid X(t_k) = x], \quad k = 0, \ldots, L-1.$

An estimate for $V_0$, the price of the option at time $t_0$, can then be defined as

$\tilde V_0(x) := \max\{f_0(x), \hat C_0(x)\}, \quad x \in \mathbb{R}^d.$

This estimate basically inherits all properties of $\hat C_0(x)$. In particular,
it is usually impossible to determine the sign of the bias of $\tilde V_0$ since
the bias of $\hat C_0$ may change its sign. One way to get a lower bound (low
biased estimate) for $V_0$ is to construct a (generally suboptimal) stopping rule

$\hat\tau = \min\{0 \le k \le L : \hat C_k(X(t_k)) \le f_k(X(t_k))\}$

with $\hat C_L \equiv 0$ by definition. Simulating a new independent set of
trajectories and averaging the pay-offs stopped according to $\hat\tau$ on these
trajectories gives us a lower bound $\hat V_0$ for $V_0$. As was observed by
practitioners, the so constructed estimate $\hat V_0$ has rather stable behavior
with respect to the estimates of continuation values
$\hat C_0(x), \ldots, \hat C_{L-1}(x)$, that is, even rather poor estimates
of continuation values may lead to a good estimate $\hat V_0$. The aim of this
paper is to find a theoretical explanation of this observation and to investigate
the properties of $\hat V_0$. In particular, we derive optimal non-asymptotic
bounds for the bias $V_0 - \mathrm{E}\,\hat V_0$ assuming some uniform
probabilistic bounds for $\hat C_r - C_r$. It is shown that the bounds for
$V_0 - \mathrm{E}\,\hat V_0$ are usually much tighter than the ones for
$V_0 - \mathrm{E}\,\tilde V_0$, implying a better quality of $\hat V_0$ as
compared to the quality of $\tilde V_0$ constructed using one and the same set
of estimates for continuation values. As an example, we consider the class of
local polynomial estimators for continuation values and derive explicit
convergence rates for $\hat V_0$ in this case.

The issues of convergence for regression algorithms have already been
studied in several papers. Clement, Lamberton and Protter (2002) were the
first to prove the convergence of the Longstaff-Schwartz algorithm. Glasserman
and Yu (2005) have shown that the number of Monte Carlo paths has
to be in general exponential in the number of basis functions used for
regression in order to ensure convergence. Recently, Egloff, Kohler and
Todorovic (2007) have derived the rates of convergence for continuation values
estimates obtained by the so called dynamic look-ahead algorithm (see Egloff
(2004)) that "interpolates" between Longstaff-Schwartz and Tsitsiklis-Roy
algorithms. As was shown in these papers, the convergence rates for
$\tilde V_0$ coincide with the rates of $\hat C_0$ and are determined by the
smoothness properties of the true continuation values $C_0, \ldots, C_{L-1}$.
It turns out that the convergence rates for $\hat V_0$ depend not only on the
smoothness of continuation values (as opposed to $\tilde V_0$), but also on the
behavior of the underlying process near the exercise boundary. Interestingly
enough, there are some cases where these rates become almost independent
either of the smoothness properties of $\{C_k\}$ or of the dimension of $X$,
and the bias of $\hat V_0$ decreases exponentially in the number of Monte Carlo
paths used to construct $\{\hat C_k\}$.

The paper is organized as follows. In Section 2.1 we introduce and discuss
the so called boundary assumption which describes the behavior of the
underlying process $X$ near the exercise boundary and heavily influences the
properties of $\hat V_0$. In Section 2.2 we derive non-asymptotic bounds for
the bias $V_0 - \mathrm{E}\,\hat V_0$ and prove that these bounds are optimal in
the minimax sense. In Section 2.3 we consider the class of local polynomial
estimates and propose a sequential algorithm based on the dynamic programming
principle to estimate all continuation values. Finally, under some regularity
assumptions, we derive exponential bounds for the corresponding continuation
values estimates and consequently the bounds for the bias
$V_0 - \mathrm{E}\,\hat V_0$.

2 Main results



2.1 Boundary assumption 

For the considered Bermudan option let us introduce a continuation region
$\mathcal{C}$ and an exercise (stopping) region $\mathcal{E}$:

(2.2) $\mathcal{C} := \{(i,x) : f_i(x) < C_i(x)\}$,
      $\mathcal{E} := \{(i,x) : f_i(x) \ge C_i(x)\}$.

Furthermore, let us assume that there exist constants $B_{0,k} > 0$,
$k = 0, \ldots, L-1$, and $\alpha > 0$ such that the inequality

(2.3) $\mathrm{P}_{t_k|t_0}\bigl(0 < |C_k(X(t_k)) - f_k(X(t_k))| \le \delta\bigr) \le B_{0,k}\,\delta^{\alpha}, \quad \delta > 0,$

holds for all $k = 0, \ldots, L-1$, where $\mathrm{P}_{t_k|t_0}$ is the
conditional distribution of $X(t_k)$ given $X(t_0)$. Assumption (2.3) provides
a useful characterization of the behavior of the continuation values $\{C_k\}$
and payoffs $\{f_k\}$ near the exercise boundary $\partial\mathcal{E}$. Although
this assumption seems quite natural, this paper is, to the best of our
knowledge, a first attempt to investigate its influence on the convergence
rates of lower bounds based on suboptimal stopping rules. We note that a
similar condition, although much simpler, appears in the context of
statistical classification problems (see, e.g. Mammen and Tsybakov (1999) and
Audibert and Tsybakov (2007)).

In the situation when all functions $C_k - f_k$, $k = 0, \ldots, L-1$, are
smooth and have non-vanishing derivatives in the vicinity of the exercise
boundary, we have $\alpha = 1$. Other values of $\alpha$ are possible as well.
We illustrate this by two simple examples.



Example 1 Fix some $\alpha > 0$ and consider a two period (L = 1) Bermudan
power put option with the payoffs

(2.4) $f_0(x) = f_1(x) = (K^{1/\alpha} - x^{1/\alpha})^+, \quad x \in \mathbb{R}_+, \quad K > 0.$

Denote by $\Delta$ the length of the exercise period, i.e. $\Delta = t_1 - t_0$.
If the process $X$ follows the Black-Scholes model with volatility $\sigma$ and
zero interest rate, then one can show that

$C_0(x) := \mathrm{E}[f_1(X(t_1)) \mid X(t_0) = x] = K^{1/\alpha}\,\Phi(-d_2) - x^{1/\alpha}\, e^{\Delta(\alpha^{-1}-1)(\sigma^2/2\alpha)}\,\Phi(-d_1)$

with $\Phi$ being the cumulative distribution function of the standard normal
distribution,

$d_1 = \dfrac{\log(x/K) + \left(\frac{1}{\alpha} - \frac{1}{2}\right)\sigma^2\Delta}{\sigma\sqrt{\Delta}}$

and $d_2 = d_1 - \sigma\sqrt{\Delta}/\alpha$. As can be easily seen, the
function $C_0(x) - f_0(x)$ satisfies $|C_0(x) - f_0(x)| \asymp x^{1/\alpha}$ for
$x \to +0$ and $C_0(x) > f_0(x)$ for all $x > 0$ if $\alpha > 1$. Hence

$\mathrm{P}\bigl(0 < |C_0(X(t_0)) - f_0(X(t_0))| \le \delta\bigr) \lesssim \delta^{\alpha}, \quad \delta \to 0, \quad \alpha > 1.$

Taking different $\alpha$ in the definition of the payoffs (2.4), we get (2.3)
satisfied for $\alpha$ ranging from 1 to $\infty$.
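
The way the exponent $\alpha$ in (2.3) reflects the local behavior of
$C_k - f_k$ can be checked on a toy case. The following is a minimal sketch
that is not taken from the paper: it assumes a standard normal state $X$ and a
hypothetical difference $g(x) = \mathrm{sign}(x)|x|^{1/\alpha}$ standing in for
$C_0 - f_0$. Under these assumptions $\mathrm{P}(0 < |g(X)| \le \delta) =
\mathrm{erf}(\delta^{\alpha}/\sqrt{2})$ in closed form, and $\alpha$ is recovered
as a log-log slope. The function names are illustrative only.

```python
import math


def margin_probability(delta, alpha):
    """P(0 < |g(X)| <= delta) for X ~ N(0, 1), g(x) = sign(x) * |x|**(1/alpha).

    |g(X)| <= delta is equivalent to |X| <= delta**alpha, so the probability
    equals erf(delta**alpha / sqrt(2)). This g is a toy stand-in for C_0 - f_0.
    """
    return math.erf(delta ** alpha / math.sqrt(2.0))


def loglog_slope(alpha, d1=1e-3, d2=1e-2):
    """Slope of log-probability against log-delta between two small deltas."""
    p1 = margin_probability(d1, alpha)
    p2 = margin_probability(d2, alpha)
    return (math.log(p2) - math.log(p1)) / (math.log(d2) - math.log(d1))


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        print(f"alpha = {alpha}: fitted exponent = {loglog_slope(alpha):.3f}")
```

In this toy setting the fitted exponent is close to the chosen $\alpha$, and
the case $\alpha = 1$ corresponds to a difference with non-vanishing derivative
at the boundary.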




Figure 1: Illustration to Example 2. 

In fact, even the extreme case "$\alpha = \infty$" may take place as shown in the
next example.

Example 2 Let us consider again a two period Bermudan option such
that the corresponding continuation value
$C_0(x) = \mathrm{E}[f_1(X(t_1)) \mid X(t_0) = x]$ is a positive and monotone
increasing function of $x$ on any compact set in $\mathbb{R}$. Fix some
$x_0 \in \mathbb{R}$ and choose $\delta_0$ satisfying $\delta_0 < C_0(x_0)$.
Define the payoff function $f_0(x)$ in the following way

$f_0(x) = \begin{cases} C_0(x_0) + \delta_0, & x < x_0, \\ C_0(x_0) - \delta_0, & x \ge x_0. \end{cases}$

So, $f_0(x)$ has a "digital" structure. Figure 1 shows the plots of $C_0$ and
$f_0$ in the case where $X$ follows the Black-Scholes model and
$f_1(x) = (x - K)^+$. It is easy to see that

$\mathrm{P}_{t_0}\bigl(0 < |C_0(X(t_0)) - f_0(X(t_0))| < \delta_0\bigr) = 0.$

On the other hand

$\mathcal{C} = \{x \in \mathbb{R} : C_0(x) > f_0(x)\} = \{x \in \mathbb{R} : x \ge x_0\},$
$\mathcal{E} = \{x \in \mathbb{R} : C_0(x) \le f_0(x)\} = \{x \in \mathbb{R} : x < x_0\}.$

So, both continuation and exercise regions are not trivial in this case.

The last example is of particular interest because, as will be shown in
the next sections, the bias of $\hat V_0$ decreases in this case exponentially
in the number of Monte Carlo paths used to estimate the continuation values
from which the lower bound $\hat V_0$ is constructed.

2.2 Non-asymptotic bounds for $V_0 - \mathrm{E}\,\hat V_0$

Let $\hat C_{k,M}$, $k = 1, \ldots, L-1$, be some estimates of continuation
values obtained using $M$ paths of the underlying process $X$ starting from
$x_0$ at time $t_0$. We may think of $(X^{(1)}(t), \ldots, X^{(M)}(t))$ as being
a vector process on the product probability space with $\sigma$-algebra
$\mathcal{F}^{\otimes M}$ and the product measure $\mathrm{P}_{x_0}^{\otimes M}$
defined on $\mathcal{F}^{\otimes M}$ via

$\mathrm{P}_{x_0}^{\otimes M}(A_1 \times \ldots \times A_M) = \mathrm{P}_{x_0}(A_1) \cdot \ldots \cdot \mathrm{P}_{x_0}(A_M),$

with $A_m \in \mathcal{F}$, $m = 1, \ldots, M$. Thus, each $\hat C_{k,M}$,
$k = 0, \ldots, L-1$, is measurable with respect to $\mathcal{F}^{\otimes M}$.
The following proposition provides non-asymptotic bounds for the bias
$V_0 - \mathrm{E}_{\mathrm{P}_{x_0}^{\otimes M}}[V_{0,M}]$ given uniform
probabilistic bounds for $\{\hat C_{k,M}\}$.

Proposition 2.1. Suppose that there exist constants $B_1$, $B_2$ and a positive
sequence $\gamma_M$ such that for any $\delta > \delta_0 > 0$ it holds

(2.5) $\mathrm{P}_{x_0}^{\otimes M}\bigl(|\hat C_{k,M}(x) - C_k(x)| \ge \delta\,\gamma_M^{-1/2}\bigr) \le B_1 \exp(-B_2\delta)$

for almost all $x$ with respect to $\mathrm{P}_{t_k|t_0}$, the conditional
distribution of $X(t_k)$ given $X(t_0)$, $k = 0, \ldots, L-1$. Define

(2.6) $V_{0,M} := \mathrm{E}\bigl[f_{\hat\tau_M}(X(t_{\hat\tau_M})) \mid X(t_0) = x_0\bigr]$

with

(2.7) $\hat\tau_M := \min\bigl\{0 \le k \le L : \hat C_{k,M}(X(t_k)) \le f_k(X(t_k))\bigr\}.$

If the boundary condition (2.3) is fulfilled, then

$0 \le V_0 - \mathrm{E}_{\mathrm{P}_{x_0}^{\otimes M}}[V_{0,M}] \le B \left[\sum_{l=0}^{L-1} B_{0,l}\right] \gamma_M^{-(1+\alpha)/2}$

with some constant $B$ depending only on $\alpha$, $B_1$ and $B_2$.

The above convergence rates cannot in general be improved, as shown
in the next proposition.

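Proposition 2.2. Let L = 2. Fix a pair of non-zero payoff functions $f_1, f_2$
such that $f_2 : \mathbb{R}^d \to \{0, 1\}$ and $0 < f_1(x) < 1$ on $[0,1]^d$.
Let $\mathcal{P}_\alpha$ be a class of pricing measures such that the boundary
condition (2.3) is fulfilled with some $\alpha > 0$. For any positive sequence
$\gamma_M$ satisfying

$\gamma_M^{-1} = o(1), \quad \gamma_M = O(M), \quad M \to \infty,$

there exist a subset $\mathcal{P}_{\alpha,\gamma}$ of $\mathcal{P}_\alpha$ and a
constant $B > 0$ such that for any $M \ge 1$, any stopping rule $\hat\tau_M$ and
any set of estimates $\{\hat C_{k,M}\}$ measurable w.r.t.
$\mathcal{F}^{\otimes M}$, we have for some $\delta > 0$ and $k = 1, 2$,

$\sup_{\mathrm{P} \in \mathcal{P}_{\alpha,\gamma}} \mathrm{P}^{\otimes M}\bigl(|\hat C_{k,M}(x) - C_k(x)| \ge \delta\,\gamma_M^{-1/2}\bigr) > 0$

for almost all $x$ w.r.t. any $\mathrm{P} \in \mathcal{P}_{\alpha,\gamma}$, and

$\sup_{\mathrm{P} \in \mathcal{P}_{\alpha,\gamma}} \Bigl\{ \sup_{\tau \in \mathcal{T}_0} \mathrm{E}_{\mathrm{P}_{t_0}}[f_\tau(X(t_\tau))] - \mathrm{E}_{\mathrm{P}^{\otimes M}}\bigl[\mathrm{E}_{\mathrm{P}_{t_0}} f_{\hat\tau_M}(X(t_{\hat\tau_M}))\bigr] \Bigr\} \ge B\,\gamma_M^{-(1+\alpha)/2}.$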

Finally, we discuss the case when "$\alpha = \infty$", meaning that there exists
$\delta_0 > 0$ such that

(2.8) $\mathrm{P}_{t_k|t_0}\bigl(0 < |C_k(X(t_k)) - f_k(X(t_k))| \le \delta_0\bigr) = 0$

for $k = 0, \ldots, L-1$. This is a very favorable situation for the pricing of
the corresponding Bermudan option. It turns out that if the continuation values
estimates $\{\hat C_{k,M}\}$ satisfy a kind of exponential inequality and (2.8)
holds, then the bias of $\hat V_{0,M}$ converges to zero exponentially fast in
$\gamma_M$.

Proposition 2.3. Suppose that for any $\delta > 0$ there exist constants $B_1$,
$B_2$ possibly depending on $\delta$ and a sequence of positive numbers
$\gamma_M$ not depending on $\delta$ such that

(2.9) $\mathrm{P}_{x_0}^{\otimes M}\bigl(|\hat C_{k,M}(x) - C_k(x)| \ge \delta\bigr) \le B_1 \exp(-B_2\gamma_M)$

for almost all $x$ with respect to $\mathrm{P}_{t_k|t_0}$, $k = 0, \ldots, L-1$.
Assume also that there exists a constant $B_f > 0$ such that

(2.10) $\mathrm{E}\Bigl[\max_{k=0,\ldots,L} f_k(X(t_k))\Bigr] \le B_f.$

If the condition (2.8) is fulfilled with some $\delta_0 > 0$, then

$0 \le V_0 - \mathrm{E}_{\mathrm{P}_{x_0}^{\otimes M}}[V_{0,M}] \le B_3 L \exp(-B_4\gamma_M)$

with some constants $B_3$ and $B_4$ depending only on $B_1$, $B_2$ and $B_f$.

Discussion Let us make a few remarks on the results of this section. First,
Proposition 2.1 implies that the convergence rates of $\hat V_{0,M}$, a Monte
Carlo estimate for $V_{0,M}$, are always faster than the convergence rates of
$\{\hat C_{k,M}\}$ provided that $\alpha > 0$. Indeed, while the convergence
rates of $\{\hat C_{k,M}\}$ are of order $\gamma_M^{-1/2}$, the bias of
$\hat V_{0,M}$ converges to zero as fast as $\gamma_M^{-(1+\alpha)/2}$. As to
the variance of $\hat V_{0,M}$, it can be made arbitrarily small by averaging
$\hat V_{0,M}$ over a large number of sets, each consisting of $M$ trajectories,
and by taking a large number of new independent Monte Carlo paths used to
average the payoffs stopped according to $\hat\tau_M$.

Second, if the condition (2.8) holds true, then the bias of $\hat V_{0,M}$
decreases exponentially in $\gamma_M$, indicating that even very imprecise
estimates of continuation values would lead to an estimate $\hat V_{0,M}$ of
acceptable quality.

Finally, let us stress that the results obtained in this section are quite
general and do not depend on the particular form of the estimates
$\{\hat C_{k,M}\}$, only the inequality (2.5) being crucial for the results to
hold. This inequality holds for various types of estimators. These may be
global least squares estimators, neural networks (see Kohler, Krzyzak and
Todorovic (2009)) or local polynomial estimators. The latter type of estimators
has not yet been well investigated (see, however, Belomestny et al. (2006) for
some empirical results) in the context of pricing Bermudan options, and we are
going to fill this gap. In the next sections we will show that if all
continuation values $\{C_k\}$ belong to the Hölder class
$\Sigma(\beta, H, \mathbb{R}^d)$ and the conditional law of $X$ satisfies some
regularity assumptions, then local polynomial estimates of continuation values
satisfy inequality (2.5) with
$\gamma_M = M^{2\beta/(2(\beta+\nu)+d)} \log^{-1}(M)$ for some $\nu \ge 0$.
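
To get a feeling for the gap between the two rates, one can evaluate
$\gamma_M^{-1/2}$ and $\gamma_M^{-(1+\alpha)/2}$ numerically. The sketch below
only tabulates these orders of magnitude with all constants set to one (an
assumption made for illustration); the function names and the parameter values
in the demo are hypothetical and not taken from the paper.

```python
import math


def gamma_M(M, beta, nu, d):
    """gamma_M = M**(2*beta / (2*(beta + nu) + d)) / log(M)."""
    return M ** (2.0 * beta / (2.0 * (beta + nu) + d)) / math.log(M)


def rates(M, beta, nu, d, alpha):
    """Orders of the continuation-value error and of the lower-bound bias.

    Returns (gamma_M**(-1/2), gamma_M**(-(1 + alpha)/2)); constants are ignored.
    """
    g = gamma_M(M, beta, nu, d)
    return g ** -0.5, g ** (-(1.0 + alpha) / 2.0)


if __name__ == "__main__":
    for M in (10**4, 10**5, 10**6):
        est, bias = rates(M, beta=2.0, nu=0.0, d=2, alpha=1.0)
        print(f"M = {M:>8d}   estimates ~ {est:.4f}   bias ~ {bias:.6f}")
```

With $\alpha = 1$ the bias order is the square of the estimation order, which
is one way to read the statement that rather poor continuation estimates can
still give a good lower bound.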

Remark 2.4. In the case of projection estimates for continuation values, some
nice bounds were recently derived in Van Roy (2009). Let
$\{X_k,\ k = 0, \ldots, L\}$ be an ergodic Markov chain with the invariant
distribution $\pi$ and $f_0(x) = \ldots = f_L(x) = f(x)$, then
$C_0(x) = \ldots = C_{L-1}(x) = C(x)$, provided that $X_0$ is distributed
according to $\pi$. Furthermore, suppose that an estimate $\hat C(x)$ for
the continuation value $C(x)$ is available and satisfies a projected Bellman
equation

(2.11) $\hat C(x) = e^{-\rho}\,\Pi\,\mathrm{E}_\pi\bigl[\max\{f(X_1), \hat C(X_1)\} \mid X_0 = x\bigr], \quad \rho > 0,$

where $\Pi$ is the corresponding projection operator. Define

$\hat V_0(x) := \mathrm{E}\bigl[f_{\hat\tau}(X_{\hat\tau}) \mid X_0 = x\bigr]$

with

$\hat\tau := \min\bigl\{0 \le k \le L : \hat C(X_k) \le f(X_k)\bigr\},$

then as shown in Van Roy (2009)

(2.12) $\bigl[\mathrm{E}_\pi |V_0(X_0) - \hat V_0(X_0)|^2\bigr]^{1/2} \le D\,\bigl[\mathrm{E}_\pi |C(X_0) - \Pi C(X_0)|^2\bigr]^{1/2}$

with some constant $D$ depending on $\rho$ only. The inequality (2.12)
indicates that the quantity

$\bigl[\mathrm{E}_\pi |V_0(X_0) - \hat V_0(X_0)|^2\bigr]^{1/2}$

might be much smaller than $\sup_x |C(x) - \hat C(x)|$ and hence qualitatively
supports the same sentiment as in our paper.

2.3 Local polynomial estimation 

We first introduce some notation related to local polynomial estimation.
Fix some $k$ such that $0 \le k < L$ and suppose that we want to estimate a
regression function

$\theta_k(x) := \mathrm{E}[g(X(t_{k+1})) \mid X(t_k) = x], \quad x \in \mathbb{R}^d$

with $g : \mathbb{R}^d \to \mathbb{R}$. Consider $M$ trajectories of the process $X$

$(X^{(m)}(t_0), \ldots, X^{(m)}(t_L)), \quad m = 1, \ldots, M,$

all starting from $x_0$, i.e. $X^{(1)}(t_0) = \ldots = X^{(M)}(t_0) = x_0$. For
some $h > 0$, $x \in \mathbb{R}^d$, an integer $l \ge 0$ and a function
$K : \mathbb{R}^d \to \mathbb{R}_+$, denote by $q_{x,M}$ a polynomial on
$\mathbb{R}^d$ of degree $l$ (maximal order of the multi-index is less than
or equal to $l$) which minimizes

(2.13) $\sum_{m=1}^{M} \bigl[Y^{(m)}(t_{k+1}) - q_{x,M}(X^{(m)}(t_k) - x)\bigr]^2 K\!\left(\frac{X^{(m)}(t_k) - x}{h}\right),$

where $Y^{(m)}(t) = g(X^{(m)}(t))$. The local polynomial estimator
$\hat\theta_{k,M}(x)$ of order $l$ for the value $\theta_k(x)$ of the regression
function $\theta_k$ at point $x$ is defined as
$\hat\theta_{k,M}(x) = q_{x,M}(0)$ if $q_{x,M}$ is the unique minimizer of (2.13)
and $\hat\theta_{k,M}(x) = 0$ otherwise. The value $h$ is called the bandwidth
and the function $K$ is called the kernel of the local polynomial estimator.
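
The definition above can be made concrete for degree $l = 1$. The following is
a minimal sketch under stated assumptions, not the implementation used in the
paper: it solves the weighted least squares problem (2.13) with the kernel
$K(z) = (1 - \|z\|^2)^+$, works with displacements scaled by $h$ (this changes
the coefficients but not the intercept $q_{x,M}(0)$), and returns 0 when the
minimizer is not unique. The function name `local_linear_estimate` is
hypothetical.

```python
import numpy as np


def local_linear_estimate(x, X, Y, h):
    """Degree-1 local polynomial estimate of E[Y | X = x].

    x : (d,) query point, X : (M, d) sample points, Y : (M,) responses,
    h : bandwidth. Returns q(0), or 0.0 if the minimizer is not unique.
    """
    x = np.asarray(x, dtype=float)
    Z = (X - x) / h                                       # scaled displacements
    w = np.clip(1.0 - np.sum(Z ** 2, axis=1), 0.0, None)  # kernel weights
    design = np.hstack([np.ones((X.shape[0], 1)), Z])     # monomials of order <= 1
    gram = design.T @ (w[:, None] * design)               # weighted normal equations
    rhs = design.T @ (w * Y)
    if np.linalg.matrix_rank(gram) < gram.shape[0]:
        return 0.0                                        # no unique minimizer
    coef = np.linalg.solve(gram, rhs)
    return float(coef[0])                                 # intercept = q(0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(500, 2))
    Y = 1.0 + 2.0 * X[:, 0] - X[:, 1]                     # exactly linear data
    print(local_linear_estimate([0.2, -0.3], X, Y, h=0.5))
```

A local linear fit reproduces linear functions exactly, which gives a simple
sanity check; for higher degrees the design matrix would contain all monomials
of order at most $l$.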

Let $\pi_u$ denote the coefficients of $q_{x,M}$ indexed by the multi-index
$u \in \mathbb{N}^d$, $q_{x,M}(z) = \sum_{|u| \le l} \pi_u z^u$. Introduce the
vectors $\Pi = (\pi_u)_{|u| \le l}$ and $S = (S_u)_{|u| \le l}$ with

$S_u = \frac{1}{M h^d} \sum_{m=1}^{M} Y^{(m)}(t_{k+1}) \left(\frac{X^{(m)}(t_k) - x}{h}\right)^{u} K\!\left(\frac{X^{(m)}(t_k) - x}{h}\right).$

Let $Z(z) = (z^u)_{|u| \le l}$ be the vector of all monomials of order less than
or equal to $l$ and the matrix
$\Gamma = (\Gamma_{u_1,u_2})_{|u_1|,|u_2| \le l}$ be defined as

(2.14) $\Gamma_{u_1,u_2} = \frac{1}{M h^d} \sum_{m=1}^{M} \left(\frac{X^{(m)}(t_k) - x}{h}\right)^{u_1+u_2} K\!\left(\frac{X^{(m)}(t_k) - x}{h}\right).$

The following result is straightforward.

Proposition 2.5. If the matrix $\Gamma$ is positive definite, then there exists
a unique polynomial on $\mathbb{R}^d$ of degree $l$ minimizing (2.13). Its vector
of coefficients is given by $\Pi = \Gamma^{-1} S$ and the corresponding local
polynomial regression function estimator has the form

(2.15) $\hat\theta_{k,M}(x) = Z^\top(0)\,\Gamma^{-1} S = \frac{1}{M h^d} \sum_{m=1}^{M} Y^{(m)}(t_{k+1})\, K\!\left(\frac{X^{(m)}(t_k) - x}{h}\right) Z^\top(0)\,\Gamma^{-1} Z\!\left(\frac{X^{(m)}(t_k) - x}{h}\right).$

Remark 2.6. From the inspection of (2.15) it becomes clear that any local
polynomial estimator can be represented as a weighted average of the
"observations" $Y^{(m)}$, $m = 1, \ldots, M$, with a special weights structure.
Hence, local polynomial estimators belong to the class of mesh estimators
introduced by Broadie and Glasserman (1997) (see also Glasserman, 2004, Ch.
8). Our results will show that this particular type of mesh estimators has
nice convergence properties in the class of smooth continuation values.

2.4 Estimation algorithm for the continuation values 

According to the dynamic programming principle, the optimal continuation
values (1.1) satisfy the following backward recursion

$C_L(x) = 0,$

$C_k(x) = \mathrm{E}\bigl[\max\bigl(f_{k+1}(X(t_{k+1})), C_{k+1}(X(t_{k+1}))\bigr) \mid X(t_k) = x\bigr], \quad x \in \mathbb{R}^d,$

with $k = 1, \ldots, L-1$. Consider $M$ paths of the process $X$, all starting
from $x_0$, and define estimates $\hat C_{1,M}, \ldots, \hat C_{L,M}$
recursively in the following way. First, we put $\hat C_{L,M}(x) = 0$. Further,
if an estimate $\hat C_{k+1,M}(x)$ is already constructed we define
$\hat C_{k,M}(x)$ as the local polynomial estimate of the function

(2.16) $\tilde C_{k,M}(x) := \mathrm{E}\bigl[\max\bigl(f_{k+1}(X(t_{k+1})), \hat C_{k+1,M}(X(t_{k+1}))\bigr) \mid X(t_k) = x\bigr],$

based on the sample

$\bigl(X^{(m)}(t_k),\ \hat C_{k+1,M}(X^{(m)}(t_{k+1}))\bigr), \quad m = 1, \ldots, M.$

Note that all $\tilde C_{k,M}$ are $\mathcal{F}^{\otimes M}$ measurable random
variables because the expectation in (2.16) is taken with respect to a new
$\sigma$-algebra $\mathcal{F}$ which is independent of $\mathcal{F}^{\otimes M}$
(one can start with the enlarged product $\sigma$-algebra
$\mathcal{F}^{\otimes (M+1)}$ and take expectation in (2.16) w.r.t. the first
coordinate). The main problem arising in the convergence analysis of the
estimate $\hat C_{k+1,M}$ is that all errors coming from the previous estimates
$\hat C_{j,M}$, $j < k$, have to be taken into account. This problem has already
been encountered by Clement, Lamberton and Protter (2002) who investigated
the convergence of the Longstaff-Schwartz algorithm.
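
The backward recursion can be sketched in a few lines. The code below is an
illustration under simplifying assumptions rather than the algorithm analysed
in the paper: it uses a degree-0 local polynomial (Nadaraya-Watson) fit with
the kernel $K(z) = (1 - \|z\|^2)^+$, regresses the response
$\max(f_{k+1}, \hat C_{k+1,M})$ evaluated on the same paths, applies no
discounting and no truncation, and returns 0 where no sample point falls
within the bandwidth. The names `nw_regressor` and
`estimate_continuation_values` are hypothetical.

```python
import numpy as np


def nw_regressor(X, Y, h):
    """Nadaraya-Watson fit on (X, Y); returns a callable x -> estimate."""
    def predict(x):
        x = np.atleast_2d(x)
        # squared scaled distances between query points and sample points
        d2 = np.sum((x[:, None, :] - X[None, :, :]) ** 2, axis=2) / h ** 2
        w = np.clip(1.0 - d2, 0.0, None)
        denom = w.sum(axis=1)
        num = w @ Y
        safe = np.where(denom > 0, denom, 1.0)
        return np.where(denom > 0, num / safe, 0.0)
    return predict


def estimate_continuation_values(paths, payoff, h):
    """Backward recursion for the continuation value estimates.

    paths  : (M, L + 1, d) array of simulated states at t_0, ..., t_L
    payoff : callable (k, x) -> (n,) array with f_k at the rows of x
    Returns a list [C_0, ..., C_L] of callables, with C_L identically zero.
    """
    L = paths.shape[1] - 1
    cont = [None] * (L + 1)
    cont[L] = lambda x: np.zeros(np.atleast_2d(x).shape[0])
    for k in range(L - 1, -1, -1):
        x_next = paths[:, k + 1, :]
        # regression response: max(f_{k+1}, C_{k+1}) on the same paths
        target = np.maximum(payoff(k + 1, x_next), cont[k + 1](x_next))
        cont[k] = nw_regressor(paths[:, k, :], target, h)
    return cont
```

Each estimate is built from the previous one, so any error made at a later
exercise date propagates to all earlier dates, which is the difficulty
mentioned above.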

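2.5 Rates of convergence for $V_0 - \mathrm{E}\,\hat V_0$

Let $\beta > 0$. Denote by $\lfloor\beta\rfloor$ the maximal integer that is
strictly less than $\beta$. For any $x \in \mathbb{R}^d$ and any
$\lfloor\beta\rfloor$ times continuously differentiable real-valued function
$g$ on $\mathbb{R}^d$, we denote by $g_x$ its Taylor polynomial of degree
$\lfloor\beta\rfloor$ at point $x$

$g_x(x') = \sum_{|s| \le \lfloor\beta\rfloor} \frac{(x' - x)^s}{s!}\, D^s g(x),$

where $s = (s_1, \ldots, s_d)$ is a multi-index, $|s| = s_1 + \ldots + s_d$ and
$D^s$ denotes the differential operator
$D^s = \dfrac{\partial^{s_1 + \ldots + s_d}}{\partial x_1^{s_1} \cdots \partial x_d^{s_d}}$.
Let $H > 0$. The class of $(\beta, H, \mathbb{R}^d)$-Hölder smooth functions,
denoted by $\Sigma(\beta, H, \mathbb{R}^d)$, is defined as the set of
functions $g : \mathbb{R}^d \to \mathbb{R}$ that are $\lfloor\beta\rfloor$ times
continuously differentiable and satisfy, for any $x, x' \in \mathbb{R}^d$, the
inequality

$|g(x') - g_x(x')| \le H \|x - x'\|^{\beta}.$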

Let us make two assumptions on the process $X$

(AX0) There exists a bounded set $\mathcal{A} \subset \mathbb{R}^d$ such that
$\mathrm{P}(X(t_0) \in \mathcal{A}) = 1$ and
$\mathrm{P}_{s|t}(X(s) \in \mathcal{A}) = 1$ for all $t$ and $s$ satisfying
$t_0 \le t \le s \le T$.

(AX1) All transitional densities $p(t_{k+1}, y \mid t_k, x)$,
$k = 0, \ldots, L-1$, of the process $X$ are uniformly bounded on
$\mathcal{A} \times \mathcal{A}$ and belong to the Hölder class
$\Sigma(\beta, H, \mathbb{R}^d)$ as functions of $x \in \mathcal{A}$, i.e.
there exists $\beta > 1$ with $\beta - \lfloor\beta\rfloor > 0$ and a constant
$H$ such that the inequality

$|p(t_{k+1}, y \mid t_k, x') - p_x(t_{k+1}, y \mid t_k, x')| \le H \|x - x'\|^{\beta}$

holds for all $x, x', y \in \mathcal{A}$ and $k = 0, \ldots, L-1$.

Consider a matrix valued function
$\bar\Gamma(s,x) = (\bar\Gamma_{u_1,u_2}(s,x))_{|u_1|,|u_2| \le \lfloor\beta\rfloor}$
with elements

$\bar\Gamma_{u_1,u_2}(s,x) := \int_{\mathbb{R}^d} z^{u_1+u_2} K(z)\, p(s, x + hz \mid t_0, x_0)\, dz,$

for any $s > t_0$.

(AX2) We assume that the minimal eigenvalue of $\bar\Gamma$ satisfies

$\min_{k=1,\ldots,L}\ \inf_{x \in \mathcal{A}}\ \min_{\|W\|=1} \bigl[W^\top \bar\Gamma(t_k, x) W\bigr] \ge \gamma_0 h^{\nu}$

with some $\nu \ge 0$ and $\gamma_0 > 0$.

Moreover, we shall assume that the kernel $K$ fulfils the following conditions

(AK1) $K$ integrates to 1 on $\mathbb{R}^d$ and

/ (1 + \\u\\^)K{u)du < oo, sup(l + \\u\\ W )K{u) < oo. 

JR d ueR d 

(AK2) $K$ is in the linear span (the set of finite linear combinations) of
functions $k \ge 0$ satisfying the following property: the subgraph of $k$,
$\{(s,u) : k(s) \ge u\}$, can be represented as a finite number of Boolean
operations among the sets of the form $\{(s,u) : p(s,u) \ge \varphi(u)\}$, where
$p$ is a polynomial on $\mathbb{R}^d \times \mathbb{R}$ and $\varphi$ is an
arbitrary real function.

Discussion The assumption (AX0) may seem rather restrictive. In fact,
as mentioned in Egloff, Kohler and Todorovic (2007), one can always use a
kind of "killing" procedure to localize the process $X$ to a ball $B_R$ in
$\mathbb{R}^d$ around $x_0$ of radius $R$. Indeed, one can replace the process
$X(t)$ with the process $X^{\mathcal{K}}(t)$ killed at the first exit time from
$B_R$. This new process $X^{\mathcal{K}}(t)$ is again a Markov process and is
connected to the original process $X(t)$ via the identity

$\mathrm{E}[g(X^{\mathcal{K}}(s)) \mid X^{\mathcal{K}}(t) = x] = \mathrm{E}[g(X(s))\,M(s) \mid X(t) = x], \quad s \ge t,$

that holds for any integrable $g : \mathbb{R}^d \to \mathbb{R}$ with
$M(s) = \mathbf{1}(\tau_R > s)$ and
$\tau_R = \inf\{t > 0 : X(t) \notin B_R\}$. This implies that

(2.17) $\sup_{\tau \in \mathcal{T}_0} \bigl|\mathrm{E}_{x_0}[f_\tau(X(t_\tau))] - \mathrm{E}_{x_0}[f_\tau(X^{\mathcal{K}}(t_\tau))]\bigr| \le \sup_{\tau \in \mathcal{T}_0} \bigl|\mathrm{E}_{x_0}[f_\tau(X(t_\tau))\,\mathbf{1}(m_\tau > R)]\bigr|$

with $m_t = \sup_{0 \le s \le t} \|X(s) - x_0\|$. The r.h.s. of (2.17) can be
made arbitrarily small by taking large values of $R$ (the exact convergence
rates depend, of course, on the properties of the process $X$).

Instead of "killing" the process $X(t)$ upon leaving $B_R$ one can reflect it
on the boundary of $B_R$. As can be seen, the new reflected process
$X^{\mathcal{R}}(t)$ satisfies (2.17) as well.

Example Let the process $X(t)$ be a d-dimensional diffusion process satisfying

$X(t) = x_0 + \int_{t_0}^{t} \mu(X(s))\, ds + \int_{t_0}^{t} \sigma(X(s))\, dW(s), \quad t \ge t_0.$

Denote by $p^{\mathcal{K}}(s - t, y \mid x)$ the transition density of the
process $X^{\mathcal{K}}$. Assume that the drift coefficient $\mu$ and the
diffusion coefficient $\sigma$ are regular enough and $\sigma$ satisfies the so
called uniform ellipticity condition on compacts, i.e. for each compact set
$K \subset \mathbb{R}^d$

(AD1) $\mu(\cdot) \in C_b^k(K)$ and $\sigma(\cdot) \in C_b^k(K)$ for some
natural $k > 1$,

(AD2) there is $\sigma_K > 0$ such that for any $\xi \in \mathbb{R}^d$ it holds

$\sum_{j,k=1}^{d} \bigl(\sigma(x)\sigma^\top(x)\bigr)_{jk}\, \xi_j \xi_k \ge \sigma_K \|\xi\|^2, \quad x \in K.$

Then (see e.g. Friedman (1964)) for any fixed $s > 0$,
$p^{\mathcal{K}}(s, y \mid x)$ is a $C^k(B_R \times B_R)$ function in $(x,y)$.
Moreover, as shown in Kim and Song (2007) (see also Bass (1997)) under
assumptions (AD1) and (AD2) there exist positive constants $C_i$,
$i = 1, \ldots, 4$, such that

$C_1\, \psi_R(s,x,y)\, s^{-d/2} e^{-C_2\|x-y\|^2/s} \le p^{\mathcal{K}}(s, y \mid x) \le C_3\, \psi_R(s,x,y)\, s^{-d/2} e^{-C_4\|x-y\|^2/s}$

for all $(s,x,y) \in (0,T] \times B_R \times B_R$ where

$\psi_R(s,x,y) := \left(1 \wedge \frac{R - \|x - x_0\|}{\sqrt{s}}\right)\left(1 \wedge \frac{R - \|y - x_0\|}{\sqrt{s}}\right).$



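Let us check now assumption (AX2) in the case when
$K(z) = \frac{\Gamma(1 + d/2)}{\pi^{d/2}}\, \mathbf{1}_{\{\|z\| \le 1\}}$.
We have for any fixed $s > t_0$ and $W \in \mathbb{R}^D$ with
$D = d(d+1) \cdot \ldots \cdot (d + \lfloor\beta\rfloor - 1)/\lfloor\beta\rfloor!$

$W^\top \bar\Gamma(s,x) W = \int_{\mathbb{R}^d} \Bigl(\sum_{|\alpha| \le \lfloor\beta\rfloor} W_\alpha z^\alpha\Bigr)^2 K(z)\, p^{\mathcal{K}}(s - t_0, x + hz \mid x_0)\, dz \ge B \int_{S(x,R)} \Bigl(\sum_{|\alpha| \le \lfloor\beta\rfloor} W_\alpha z^\alpha\Bigr)^2 (R - \|x + hz - x_0\|)\, dz$

with some positive constant $B$ depending on $s - t_0$ and $R$, and
$S(x,R) := \{z : \|z\| \le 1,\ \|x + hz - x_0\| \le R\}$. Introduce

$\tilde S(x,R) := \{z : \|z\| \le 1,\ \|x + hz - x_0\| \le R - h/2\}.$

Since $\tilde S(x,R) \subset S(x,R)$ we get

$\int_{S(x,R)} \Bigl(\sum_{|\alpha| \le \lfloor\beta\rfloor} W_\alpha z^\alpha\Bigr)^2 (R - \|x + hz - x_0\|)\, dz \ge \frac{h}{2} \int_{\tilde S(x,R)} \Bigl(\sum_{|\alpha| \le \lfloor\beta\rfloor} W_\alpha z^\alpha\Bigr)^2 dz.$

Using now the fact that the Lebesgue measure of the set $\tilde S(x,R)$ is
larger than some positive number $\lambda$ for all $x \in B_R$, where $\lambda$
depends on $R$ and $d$ but does not depend on $h$, we get

$\min_{k=1,\ldots,L}\ \inf_{x \in B_R}\ \min_{\|W\|=1} \bigl[W^\top \bar\Gamma(t_k, x) W\bigr] \ge \gamma_0 h$

with some $\gamma_0 > 0$ by the compactness argument. Thus, assumption (AX2) is
fulfilled with $\nu = 1$.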

Let us now reflect the diffusion process $X(t)$ instead of "killing" it by
defining a reflected process $X^{\mathcal{R}}(t)$ which satisfies a reflected
stochastic differential equation in $B_R$, with oblique reflection at the
boundary of $B_R$ in the conormal direction, i.e.

$X^{\mathcal{R}}(t) = x_0 + \int_{t_0}^{t} \mu(X^{\mathcal{R}}(s))\, ds + \int_{t_0}^{t} \sigma(X^{\mathcal{R}}(s))\, dW(s) + \int_{t_0}^{t} n(X^{\mathcal{R}}(s))\, dL(s),$

where $n$ is the inward normal vector on the boundary of $B_R$ and $L(t)$
is a local time process which increases only on $\{\|x\| = R\}$, i.e.
$L(t) = \int_{t_0}^{t} \mathbf{1}_{\{\|X^{\mathcal{R}}(s)\| = R\}}\, dL(s)$.
Denote by $p^{\mathcal{R}}(s, y \mid x)$ a transition density of
$X^{\mathcal{R}}(t)$. It satisfies a parabolic partial differential equation
with Neumann boundary conditions. Under (AD1) it belongs to
$C^k(B_R \times B_R)$ (see Sato and Ueto (1965)) for any fixed $s > 0$.
Moreover, using a strong version of the maximum principle (see, e.g. Friedman,
1964, Theorem 1 in Chapter 2) one can show that under assumption (AD2) the
transition density $p^{\mathcal{R}}(s, y \mid x)$ is strictly positive on
$(0,T] \times B_R \times B_R$. Similar calculations as before show that in this
case

$\min_{k=1,\ldots,L}\ \inf_{x \in B_R}\ \min_{\|W\|=1} \bigl[W^\top \bar\Gamma(t_k, x) W\bigr] \ge \gamma_0 > 0$

and hence assumption (AX2) holds with $\nu = 0$.

Remark 2.7. It can be shown that (AK2) is fulfilled if $K(x) = f(p(x))$ for
some polynomial $p$ and a bounded real function $f$ of bounded variation.
Obviously, the standard Gaussian kernel falls into this category. Another
example is the case where $K$ is a pyramid or $K = \mathbf{1}_{[-1,1]^d}$.

In the sequel we will consider a truncated version of the local polynomial
estimator $\hat C_{k,M}(x)$ which is defined as follows. If the smallest
eigenvalue of the matrix $\Gamma$ defined in (2.14) is greater than
$h^{\nu} (\log M)^{-1}$ we set $T[\hat C_{k,M}](x)$ to be equal to the
projection of $\hat C_{k,M}(x)$ on the interval $[0, C_{\max}]$ with
$C_{\max} = \max_{k=0,\ldots,L-1} \sup_{x \in \mathcal{A}} C_k(x)$ ($C_{\max}$
is finite due to (AX0) and (AX1)). Otherwise, we put
$T[\hat C_{k,M}](x) = 0$. The following propositions provide exponential
bounds for the truncated estimator $\{T[\hat C_{k,M}]\}$.
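
The truncation rule can be written as a small helper. This is only a sketch of
the rule as described in the preceding paragraph; the function name and
argument names are hypothetical, and the raw estimate and the smallest
eigenvalue of $\Gamma$ are assumed to be computed elsewhere.

```python
import math


def truncate_estimate(raw_value, min_eigenvalue, h, nu, M, c_max):
    """Truncated estimator T[C](x) built from a raw local polynomial estimate.

    raw_value      : untruncated estimate at the point x
    min_eigenvalue : smallest eigenvalue of the matrix Gamma at x
    h, nu, M       : bandwidth, exponent from (AX2), number of paths
    c_max          : upper bound for the true continuation values
    """
    threshold = h ** nu / math.log(M)
    if min_eigenvalue > threshold:
        # well-conditioned case: project the estimate onto [0, c_max]
        return min(max(raw_value, 0.0), c_max)
    # ill-conditioned case: the estimator is set to zero
    return 0.0
```

For instance, with $h = 0.1$, $\nu = 1$ and $M = 1000$ the threshold is
$0.1/\log 1000 \approx 0.0145$, so a point with smallest eigenvalue $10^{-6}$
is assigned the value 0 regardless of the raw estimate.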

Proposition 2.8. Let conditions (AX0)-(AX2), (AK1) and (AK2) be satisfied
and let $\{T[\hat C_{k,M}]\}$ be the continuation values estimates constructed
as described in Section 2.4 using truncated local polynomial estimators of
degree $\lfloor\beta\rfloor$. Then there exist positive constants $B_1$, $B_2$
and $B_3$ such that for any $h$ satisfying
$B_1 h^{\beta} < \sqrt{|\log h| / (M h^d)}$ and any $\zeta \ge \zeta_0$ with some
$\zeta_0 > 0$ it holds



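$\mathrm{P}_{x_0}^{\otimes M}\left(\sup_{x \in \mathcal{A}} \bigl|T[\hat C_{k,M}](x) - C_k(x)\bigr| \ge \zeta\, h^{-\nu} \sqrt{\frac{|\log h|}{M h^d}}\right) \le B_2 \exp(-B_3 \zeta)$

for $k = 1, \ldots, L-1$. As a consequence, we get with
$h = M^{-1/(2(\beta+\nu)+d)}$ and any $\zeta \ge \zeta_0 > 0$

$\mathrm{P}_{x_0}^{\otimes M}\left(\sup_{x \in \mathcal{A}} \bigl|T[\hat C_{k,M}](x) - C_k(x)\bigr| \ge \zeta\, \frac{\log^{1/2} M}{M^{\beta/(2(\beta+\nu)+d)}}\right) \le B_2 \exp(-B_3 \zeta).$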

Proposition 2.9. Let conditions (AX0)-(AX2), (AK1) and (AK2) be satisfied,
then for any $\delta > 0$ there exist positive constants $B_4$ and $B_5$ such
that

$\mathrm{P}_{x_0}^{\otimes M}\left(\sup_{x \in \mathcal{A}} \bigl|T[\hat C_{k,M}](x) - C_k(x)\bigr| \ge \delta\right) \le B_4 \exp(-B_5 M)$

for $k = 1, \ldots, L-1$.

Remark 2.10. As can be seen from the proof of Proposition 2.8 and Remark 6.2
(note that $u_0$ in (6.25) grows linearly in $d$) the constant $B_3$ decreases
with the dimension $d$ as fast as $1/d$. The constant $B_5$ is of order

5 { t v)/P /d. 

Combining Proposition 2.1 with Proposition 2.8 and Proposition 2.9 
leads to the following 

Theorem 2.11. Let conditions (AX0)-(AX2), (AK1) and (AK2) be satisfied.
Define

$V_{0,M} := \mathrm{E}\bigl(f_{\hat\tau_M}(X(t_{\hat\tau_M})) \mid X(t_0) = x_0\bigr),$

with

$\hat\tau_M := \min\bigl\{0 \le k \le L : T[\hat C_{k,M}](X(t_k)) \le f_k(X(t_k))\bigr\},$

where $\{T[\hat C_{k,M}]\}$ are continuation values estimates constructed using
truncated local polynomial estimators of degree $\lfloor\beta\rfloor$. If the
boundary condition (2.3) is fulfilled for some $\alpha > 0$, then

$0 \le V_0 - \mathrm{E}_{\mathrm{P}_{x_0}^{\otimes M}}[V_{0,M}] \le D_1\, M^{-\beta(1+\alpha)/(2(\beta+\nu)+d)} \log^{(1+\alpha)/2}(M),$

with some constant $D_1$. On the other hand, if the condition (2.8) is
satisfied with some $\delta_0 > 0$, then the bias of $\hat V_{0,M}$ decreases
exponentially in $M$, i.e. there exist positive constants $D_2$ and $D_3$ such
that

$0 \le V_0 - \mathrm{E}_{\mathrm{P}_{x_0}^{\otimes M}}[V_{0,M}] \le D_2 \exp(-D_3 M).$






Discussion As we can see, the rates of convergence for $\{\hat C_{k,M}\}$ are
of order

$M^{-\beta/(2(\beta+\nu)+d)} \log^{1/2} M$

which can be proved to be optimal under assumption (AX2), up to a logarithmic
factor, for the class of Hölder smooth continuation values $\{C_k(x)\}$.

On the other hand, the rates of convergence for
$\mathrm{E}_{\mathrm{P}_{x_0}^{\otimes M}}[V_{0,M}]$ are of order

$M^{-\beta(1+\alpha)/(2(\beta+\nu)+d)} \log^{(1+\alpha)/2}(M)$

and are always faster than the ones of $\{\hat C_{k,M}\}$ provided that
$\alpha > 0$. The most interesting behavior of the lower bound $\hat V_{0,M}$
can be observed if the condition (2.8) is fulfilled. In this case the bias of
$\hat V_{0,M}$ becomes as small as $\exp(-D_3 M)$. This means that even in the
class of continuation values with an arbitrarily low (but positive) Hölder
smoothness (e.g. in the class of non-differentiable continuation values), and
therefore with arbitrarily slow convergence rates of the estimates
$\{\hat C_{k,M}\}$, the bias of the lower bound $\hat V_{0,M}$ converges
exponentially fast to zero.



3 Numerical example: Bermudan max call 

This is a benchmark example studied in Broadie and Glasserman (1997) and 
Glasserman (2004) among others. Specifically, the model with d identically 
distributed assets is considered, where each underlying has dividend yield 5. 
The risk-neutral dynamic of assets is given by 

^M = {r _ S )dt + adW k (t), k = l,...,d, 

where W k (t), k = l,...,d, are independent one-dimensional Brownian mo- 
tions and r, 5, a are constants. At any time t 6 {to, ti,} the holder of the 
option may exercise it and receive the payoff 

f(X(t)) = (max(Xi(t),..,X d (t)) - k)+. 

We take d = 2, r = 5%, 5 = 10%, a = 0.2, k = 100 and U = iT/L, i = 
0, ...,L, with T = 3, L = 9 as in Glasserman (2004, Chapter 8). First, we 
estimate all continuation values using the dynamic programming algorithm 
and the so called Nadaraya- Watson regression estimator 



$$ \hat C_{k,M}(x) = \frac{\sum_{m=1}^{M} K\bigl((x - X^{(m)}(t_k))/h\bigr)\, Y^{(m)}_{k+1}}{\sum_{m=1}^{M} K\bigl((x - X^{(m)}(t_k))/h\bigr)} \qquad (3.18) $$

with $Y^{(m)}_{k+1} = \max\bigl(f_{k+1}(X^{(m)}(t_{k+1})),\ e^{-rT/L}\,\hat C_{k+1,M}(X^{(m)}(t_{k+1}))\bigr)$, $k = 0,\ldots,L-1$.
Here $K$ is a kernel, $h > 0$ is a bandwidth and $(X^{(m)}(t_1),\ldots,X^{(m)}(t_L))$,
$m = 1,\ldots,M$, is a set of paths of the process $X$, all starting from the point
$x_0 = (90, 90)$ at $t_0 = 0$. As can be easily seen, the estimator (3.18) is a local
polynomial estimator of degree 0. Upon estimating $\hat C_{k,M}$, we define a first
estimate for the price of the option at time $t_0 = 0$ as

$$ \tilde V_0 := \frac{1}{M} \sum_{m=1}^{M} Y_1^{(m)}. $$

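Next, using the previously constructed estimates of continuation values, we
pathwise compute a stopping policy $\hat\tau$ via

$$ \hat\tau^{(n)} := \min\bigl\{1 \le k \le L : \hat C_{k,M}(\tilde X^{(n)}(t_k)) \le f_k(\tilde X^{(n)}(t_k))\bigr\}, \quad n = 1,\ldots,N, $$

where $(\tilde X^{(n)}(t_1),\ldots,\tilde X^{(n)}(t_L))$, $n = 1,\ldots,N$, is a new independent set of
trajectories of the process $X$, all starting from $x_0 = (90, 90)$ at $t_0 = 0$. The
stopping policy $\hat\tau$ yields a lower bound

$$ \hat V_0 = \frac{1}{N} \sum_{n=1}^{N} e^{-r t_{\hat\tau^{(n)}}} f_{\hat\tau^{(n)}}\bigl(\tilde X^{(n)}(t_{\hat\tau^{(n)}})\bigr). $$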
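The two-stage procedure described above (backward induction with a Nadaraya-Watson estimator, then a lower bound on fresh paths) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the reported experiments: the function names are hypothetical, the discount factor is folded into the payoff (so all quantities are expressed in time-0 money, which may differ in form from the placement of $e^{-rT/L}$ in the text), the estimator is set to 0 where no training point falls inside the kernel support, and the default sample sizes are reduced for speed.

```python
import numpy as np

def simulate_paths(n_paths, x0, r, delta, sigma, T, L, rng):
    """Exact simulation of independent geometric Brownian motions on t_i = i*T/L.
    Returns an array of shape (n_paths, L + 1, d) with paths[:, 0] = x0."""
    d, dt = len(x0), T / L
    z = rng.standard_normal((n_paths, L, d))
    incr = (r - delta - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate([np.zeros((n_paths, 1, d)), np.cumsum(incr, axis=1)], axis=1)
    return np.asarray(x0, dtype=float) * np.exp(log_paths)

def payoff(x, k, r, T, L, strike):
    """Max-call payoff at t_k, discounted to time 0 (a convention of this sketch)."""
    return np.exp(-r * k * T / L) * np.maximum(x.max(axis=-1) - strike, 0.0)

def nw_estimate(x_query, x_train, y_train, h, chunk=512):
    """Nadaraya-Watson estimate with kernel K(u) = (1 - |u|^2)^+.
    Queries are processed in chunks to limit memory; zero total weight gives 0."""
    out = np.zeros(len(x_query))
    for s in range(0, len(x_query), chunk):
        diff = (x_query[s:s + chunk, None, :] - x_train[None, :, :]) / h
        w = np.maximum(1.0 - (diff**2).sum(axis=-1), 0.0)
        denom, num = w.sum(axis=1), w @ y_train
        out[s:s + chunk] = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return out

def price_bermudan(M=1000, N=1000, h=60.0, seed=0, x0=(90.0, 90.0),
                   r=0.05, delta=0.10, sigma=0.2, strike=100.0, T=3.0, L=9):
    """Returns (first estimate, lower bound) for the Bermudan max call."""
    rng = np.random.default_rng(seed)
    args = (r, T, L, strike)
    X = simulate_paths(M, x0, r, delta, sigma, T, L, rng)
    # Backward induction: Y holds Y_{k+1}; it starts as the terminal payoff.
    Y = payoff(X[:, L], L, *args)
    fits = {}  # k -> (training points, responses) that define the estimate at t_k
    for k in range(L - 1, 0, -1):
        fits[k] = (X[:, k], Y.copy())
        C_k = nw_estimate(X[:, k], X[:, k], Y, h)
        Y = np.maximum(payoff(X[:, k], k, *args), C_k)
    v_first = Y.mean()  # average of Y_1 over the training paths
    # Lower bound: apply the estimated stopping rule to N independent paths.
    Xn = simulate_paths(N, x0, r, delta, sigma, T, L, rng)
    value = payoff(Xn[:, L], L, *args)  # exercise at t_L unless stopped earlier
    stopped = np.zeros(N, dtype=bool)
    for k in range(1, L):
        f_k = payoff(Xn[:, k], k, *args)
        C_k = nw_estimate(Xn[:, k], *fits[k], h)
        exercise = (~stopped) & (C_k <= f_k)
        value[exercise] = f_k[exercise]
        stopped |= exercise
    return v_first, value.mean()
```

A call such as `price_bermudan(M=4000, N=4000, h=120.0)` mirrors the sample sizes quoted in the text; repeating it over several seeds and bandwidths gives the kind of comparison between the two estimates that the boxplots summarize.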

In Figure 2 we show the boxplots of the first estimate $\tilde V_0$ and the lower bound $\hat V_0$
based on 100 sets of trajectories, each of size $M = 4000$ ($N = 4000$), for different values of
the bandwidth $h$, where the triangle kernel $K(x) = (1 - \|x\|^2)^+$ is used to
construct (3.18). The true value $V_0$ of the option (computed using a
two-dimensional binomial lattice) is 8.08 in this case. Several observations can
be made by an examination of Figure 2. First, while the bias of $\hat V_0$ is always
smaller than the bias of $\tilde V_0$, the largest difference takes place for large $h$.
This can be explained by the fact that for large $h$ more observations
with $X^{(m)}(t_k)$ lying far away from the given point $x$ become involved in the
construction of $\hat C_{k,M}(x)$. This increases the bias of
the estimate (3.18), and $\tilde V_0$ quickly deteriorates with increasing $h$. The most
interesting phenomenon is, however, the behavior of $\hat V_0$, which turns out to
be quite stable with respect to $h$. So, in the case of rather poor estimates of
continuation values (as $h$ increases), $\hat V_0$ looks very reasonable and even
becomes closer to the true price.

We stress that the aim of this example is not to show the strength of
the local polynomial estimation algorithms (although the performance of the lower bound $\hat V_0$
for $h = 120$ is quite comparable to the performance of a linear regression
algorithm reported in Glasserman (2004)) but rather to illustrate the main
message of this paper, namely the efficiency of $\hat V_0$ as
compared to the estimates based on the direct use of continuation values
estimates.

Figure 2: Boxplots of the estimates $\tilde V_0$ (0) and $\hat V_0$ (1) for different values of
the bandwidth h.

4 Conclusion



In this paper we derive optimal rates of convergence for low biased estimates 
for the price of a Bermudan option based on suboptimal exercise policies 
obtained from some estimates of the optimal continuation values. We have 
shown that these rates are usually much faster than the convergence rates 
of the corresponding continuation values estimates. This may explain the 
efficiency of these lower bounds observed in practice. Moreover, it turns 
out that there are some cases where the expected values of the lower bounds 
based on suboptimal stopping rules achieve very fast convergence rates which 
are exponential in the number of paths used to estimate the corresponding 
continuation values. 

5 Proofs 

5.1 Proof of Proposition 2.1 

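Define

$$ \tau_j := \min\{j \le k \le L : C_k(X(t_k)) \le f_k(X(t_k))\}, \quad j = 0,\ldots,L, $$
$$ \hat\tau_{j,M} := \min\{j \le k \le L : \hat C_{k,M}(X(t_k)) \le f_k(X(t_k))\}, \quad j = 0,\ldots,L, $$

and

$$ V_{k,M}(x) := \mathrm{E}\bigl[f_{\hat\tau_{k,M}}(X(t_{\hat\tau_{k,M}})) \mid X(t_k) = x\bigr], \quad x \in \mathbb{R}^d. $$

The so-called Snell envelope process $V_k$ is related to $\tau_k$ via

$$ V_k(x) = \mathrm{E}\bigl[f_{\tau_k}(X(t_{\tau_k})) \mid X(t_k) = x\bigr], \quad x \in \mathbb{R}^d. $$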

The following lemma provides a useful inequality which will be repeatedly 
used in our analysis. 

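Lemma 5.1. For any $k = 0,\ldots,L-1$, it holds with probability one

$$ 0 \le V_k(X(t_k)) - V_{k,M}(X(t_k)) \le \mathrm{E}^{\mathcal{F}_{t_k}}\Bigl[\sum_{l=k}^{L-1} |f_l(X(t_l)) - C_l(X(t_l))| \bigl(1_{\{\hat\tau_{l,M} > l,\ \tau_l = l\}} + 1_{\{\hat\tau_{l,M} = l,\ \tau_l > l\}}\bigr)\Bigr]. \qquad (5.19) $$
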
Proof. We shall use induction to prove (5.19). For $k = L-1$ we have

$$ \begin{aligned}
V_{L-1}(X(t_{L-1})) - V_{L-1,M}(X(t_{L-1})) &= \mathrm{E}^{\mathcal{F}_{t_{L-1}}}\bigl[(f_{L-1}(X(t_{L-1})) - f_L(X(t_L)))\, 1_{\{\tau_{L-1} = L-1,\ \hat\tau_{L-1,M} = L\}}\bigr] \\
&\quad + \mathrm{E}^{\mathcal{F}_{t_{L-1}}}\bigl[(f_L(X(t_L)) - f_{L-1}(X(t_{L-1})))\, 1_{\{\tau_{L-1} = L,\ \hat\tau_{L-1,M} = L-1\}}\bigr] \\
&= |f_{L-1}(X(t_{L-1})) - C_{L-1}(X(t_{L-1}))|\, 1_{\{\hat\tau_{L-1,M} \ne \tau_{L-1}\}},
\end{aligned} $$

since the events $\{\tau_{L-1} = L\}$ and $\{\hat\tau_{L-1,M} = L\}$ are measurable w.r.t. $\mathcal{F}_{t_{L-1}}$.
Thus, (5.19) holds with $k = L-1$. Suppose that (5.19) holds with $k = L'+1$.
Let us prove it for $k = L'$. Consider a decomposition



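$$ f_{\tau_{L'}}(X(t_{\tau_{L'}})) - f_{\hat\tau_{L',M}}(X(t_{\hat\tau_{L',M}})) = S_1 + S_2 + S_3 $$

with

$$ \begin{aligned}
S_1 &:= \bigl[f_{\tau_{L'}}(X(t_{\tau_{L'}})) - f_{\hat\tau_{L',M}}(X(t_{\hat\tau_{L',M}}))\bigr]\, 1_{\{\tau_{L'} > L',\ \hat\tau_{L',M} > L'\}}, \\
S_2 &:= \bigl[f_{\tau_{L'}}(X(t_{\tau_{L'}})) - f_{\hat\tau_{L',M}}(X(t_{\hat\tau_{L',M}}))\bigr]\, 1_{\{\tau_{L'} > L',\ \hat\tau_{L',M} = L'\}}, \\
S_3 &:= \bigl[f_{\tau_{L'}}(X(t_{\tau_{L'}})) - f_{\hat\tau_{L',M}}(X(t_{\hat\tau_{L',M}}))\bigr]\, 1_{\{\tau_{L'} = L',\ \hat\tau_{L',M} > L'\}}.
\end{aligned} $$

Since

$$ \begin{aligned}
\mathrm{E}^{\mathcal{F}_{t_{L'}}}[S_1] &= \mathrm{E}^{\mathcal{F}_{t_{L'}}}\bigl[V_{L'+1}(X(t_{L'+1})) - V_{L'+1,M}(X(t_{L'+1}))\bigr]\, 1_{\{\tau_{L'} > L',\ \hat\tau_{L',M} > L'\}}, \\
\mathrm{E}^{\mathcal{F}_{t_{L'}}}[S_2] &= \bigl(\mathrm{E}^{\mathcal{F}_{t_{L'}}}\bigl[f_{\tau_{L'+1}}(X(t_{\tau_{L'+1}}))\bigr] - f_{L'}(X(t_{L'}))\bigr)\, 1_{\{\tau_{L'} > L',\ \hat\tau_{L',M} = L'\}} \\
&= \bigl(C_{L'}(X(t_{L'})) - f_{L'}(X(t_{L'}))\bigr)\, 1_{\{\tau_{L'} > L',\ \hat\tau_{L',M} = L'\}}
\end{aligned} $$

and

$$ \begin{aligned}
\mathrm{E}^{\mathcal{F}_{t_{L'}}}[S_3] &= \bigl(f_{L'}(X(t_{L'})) - \mathrm{E}^{\mathcal{F}_{t_{L'}}}\bigl[f_{\hat\tau_{L'+1,M}}(X(t_{\hat\tau_{L'+1,M}}))\bigr]\bigr)\, 1_{\{\tau_{L'} = L',\ \hat\tau_{L',M} > L'\}} \\
&= \bigl(f_{L'}(X(t_{L'})) - C_{L'}(X(t_{L'}))\bigr)\, 1_{\{\tau_{L'} = L',\ \hat\tau_{L',M} > L'\}} \\
&\quad + \mathrm{E}^{\mathcal{F}_{t_{L'}}}\bigl[V_{L'+1}(X(t_{L'+1})) - V_{L'+1,M}(X(t_{L'+1}))\bigr]\, 1_{\{\tau_{L'} = L',\ \hat\tau_{L',M} > L'\}},
\end{aligned} $$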

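we get with probability one

$$ \begin{aligned}
V_{L'}(X(t_{L'})) - V_{L',M}(X(t_{L'})) &\le |f_{L'}(X(t_{L'})) - C_{L'}(X(t_{L'}))| \bigl(1_{\{\hat\tau_{L',M} > L',\ \tau_{L'} = L'\}} + 1_{\{\hat\tau_{L',M} = L',\ \tau_{L'} > L'\}}\bigr) \\
&\quad + \mathrm{E}^{\mathcal{F}_{t_{L'}}}\bigl[V_{L'+1}(X(t_{L'+1})) - V_{L'+1,M}(X(t_{L'+1}))\bigr].
\end{aligned} $$

Our induction assumption implies now that

$$ V_{L'}(X(t_{L'})) - V_{L',M}(X(t_{L'})) \le \mathrm{E}^{\mathcal{F}_{t_{L'}}}\Bigl[\sum_{l=L'}^{L-1} |f_l(X(t_l)) - C_l(X(t_l))| \bigl(1_{\{\hat\tau_{l,M} > l,\ \tau_l = l\}} + 1_{\{\hat\tau_{l,M} = l,\ \tau_l > l\}}\bigr)\Bigr] $$

and hence (5.19) holds for $k = L'$. □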






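Let us continue with the proof of Proposition 2.1. Consider the sets
$\mathcal{E}_l, \mathcal{A}_{l,j} \subset \mathbb{R}^d$, $l = 0,\ldots,L-1$, $j = 1, 2, \ldots$, defined as

$$ \begin{aligned}
\mathcal{E}_l &:= \{x \in \mathbb{R}^d : \hat C_{l,M}(x) \le f_l(x),\ C_l(x) > f_l(x)\} \cup \{x \in \mathbb{R}^d : \hat C_{l,M}(x) > f_l(x),\ C_l(x) \le f_l(x)\}, \\
\mathcal{A}_{l,0} &:= \{x \in \mathbb{R}^d : 0 < |C_l(x) - f_l(x)| \le \gamma_M^{-1/2}\}, \\
\mathcal{A}_{l,j} &:= \{x \in \mathbb{R}^d : 2^{j-1}\gamma_M^{-1/2} < |C_l(x) - f_l(x)| \le 2^{j}\gamma_M^{-1/2}\}, \quad j > 0.
\end{aligned} $$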
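We may write

$$ \begin{aligned}
V_0(X(t_0)) - V_{0,M}(X(t_0)) &\le \mathrm{E}^{\mathcal{F}_{t_0}}\Bigl[\sum_{l=0}^{L-1} |f_l(X(t_l)) - C_l(X(t_l))|\, 1_{\{X(t_l) \in \mathcal{E}_l\}}\Bigr] \\
&= \sum_{j=0}^{\infty} \mathrm{E}^{\mathcal{F}_{t_0}}\Bigl[\sum_{l=0}^{L-1} |f_l(X(t_l)) - C_l(X(t_l))|\, 1_{\{X(t_l) \in \mathcal{A}_{l,j} \cap \mathcal{E}_l\}}\Bigr] \\
&\le \gamma_M^{-1/2} \sum_{l=0}^{L-1} P_{t_l|t_0}\bigl(0 < |C_l(X(t_l)) - f_l(X(t_l))| \le \gamma_M^{-1/2}\bigr) \\
&\quad + \sum_{j=1}^{\infty} \mathrm{E}^{\mathcal{F}_{t_0}}\Bigl[\sum_{l=0}^{L-1} |f_l(X(t_l)) - C_l(X(t_l))|\, 1_{\{X(t_l) \in \mathcal{A}_{l,j} \cap \mathcal{E}_l\}}\Bigr].
\end{aligned} $$
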

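Using the fact that

$$ |f_l(X(t_l)) - C_l(X(t_l))| \le |\hat C_{l,M}(X(t_l)) - C_l(X(t_l))|, \quad l = 0,\ldots,L-1, $$

on $\{X(t_l) \in \mathcal{E}_l\}$, we get for any $j \ge 1$ and $l \ge 0$

$$ \begin{aligned}
&\mathrm{E}^{\mathcal{F}_{t_0}} \mathrm{E}_{P^{\otimes M}}\bigl[|f_l(X(t_l)) - C_l(X(t_l))|\, 1_{\{X(t_l) \in \mathcal{A}_{l,j} \cap \mathcal{E}_l\}}\bigr] \\
&\quad \le 2^{j}\gamma_M^{-1/2}\, \mathrm{E}^{\mathcal{F}_{t_0}} \mathrm{E}_{P^{\otimes M}}\bigl[1_{\{|\hat C_{l,M}(X(t_l)) - C_l(X(t_l))| > 2^{j-1}\gamma_M^{-1/2}\}}\, 1_{\{0 < |f_l(X(t_l)) - C_l(X(t_l))| \le 2^{j}\gamma_M^{-1/2}\}}\bigr] \\
&\quad \le 2^{j}\gamma_M^{-1/2}\, \mathrm{E}^{\mathcal{F}_{t_0}}\bigl[P^{\otimes M}\bigl(|\hat C_{l,M}(X(t_l)) - C_l(X(t_l))| > 2^{j-1}\gamma_M^{-1/2}\bigr)\, 1_{\{0 < |f_l(X(t_l)) - C_l(X(t_l))| \le 2^{j}\gamma_M^{-1/2}\}}\bigr] \\
&\quad \le B_1 2^{j}\gamma_M^{-1/2} \exp(-B_2 2^{j-1})\, P_{t_l|t_0}\bigl(0 < |f_l(X(t_l)) - C_l(X(t_l))| \le 2^{j}\gamma_M^{-1/2}\bigr) \\
&\quad \le B_1 B_0\, 2^{j(1+\alpha)} \gamma_M^{-(1+\alpha)/2} \exp(-B_2 2^{j-1}),
\end{aligned} $$

where Assumption 2.3 is used to get the last inequality. Finally, we get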



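$$ \begin{aligned}
V_0(X(t_0)) - \mathrm{E}_{P^{\otimes M}}\bigl[V_{0,M}(X(t_0))\bigr] &\le B_0 L\, \gamma_M^{-(1+\alpha)/2} + B_0 B_1 L\, \gamma_M^{-(1+\alpha)/2} \sum_{j \ge 1} 2^{j(1+\alpha)} \exp(-B_2 2^{j-1}) \\
&\le B L\, \gamma_M^{-(1+\alpha)/2}
\end{aligned} $$

with some constant $B$ depending on $B_0$, $B_1$, $B_2$ and $\alpha$.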
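The last step relies on the fact that a series of the form $\sum_{j \ge 1} 2^{j(1+\alpha)} \exp(-B_2 2^{j-1})$ is finite for every $B_2 > 0$, because the exponential factor decays doubly exponentially in $j$. The short sketch below evaluates such a series numerically; the function name and the sample values $\alpha = 1$, $B_2 = 1$ are illustrative assumptions, not quantities from the paper.

```python
import math

def peeling_series(alpha, b2, n_terms=200):
    """Partial sum of sum_{j>=1} 2^{j(1+alpha)} * exp(-b2 * 2^{j-1}).
    Each term is computed in the log domain to avoid overflow of 2^{j(1+alpha)}."""
    log2 = math.log(2.0)
    return sum(math.exp(j * (1.0 + alpha) * log2 - b2 * 2.0 ** (j - 1))
               for j in range(1, n_terms + 1))

# With alpha = 1 and b2 = 1 the terms are negligible after about six summands.
print(peeling_series(1.0, 1.0))
```

For these sample values the sum is about 4.895, and adding more terms does not change it at double precision, which illustrates why the factor can be absorbed into the constant $B$.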



5.2 Proof of Proposition 2.2 

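We have

$$ \begin{aligned}
V_0(X(t_0)) - V_{0,M}(X(t_0)) &= \mathrm{E}^{\mathcal{F}_{t_0}}\bigl[(f_1(X(t_1)) - f_2(X(t_2)))\, 1(\tau_1 = 1,\ \hat\tau_{1,M} = 2)\bigr] \\
&\quad + \mathrm{E}^{\mathcal{F}_{t_0}}\bigl[(f_2(X(t_2)) - f_1(X(t_1)))\, 1(\tau_1 = 2,\ \hat\tau_{1,M} = 1)\bigr] \\
&= \mathrm{E}^{\mathcal{F}_{t_0}}\bigl[|f_1(X(t_1)) - C_1(X(t_1))|\, 1_{\{\hat\tau_{1,M} \ne \tau_1\}}\bigr]. \qquad (5.20)
\end{aligned} $$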



For an integer $q \ge 1$ consider a regular grid on $[0,1]^d$ defined as

$$ G_q = \Bigl\{\Bigl(\frac{2k_1+1}{2q},\ldots,\frac{2k_d+1}{2q}\Bigr) : k_i \in \{0,\ldots,q-1\},\ i = 1,\ldots,d\Bigr\}. $$

Let $n_q(x) \in G_q$ be the closest point to $x \in \mathbb{R}^d$ among points in $G_q$. Consider
the partition $\mathcal{X}'_1,\ldots,\mathcal{X}'_{q^d}$ of $[0,1]^d$ canonically defined using the grid $G_q$ ($x$
and $y$ belong to the same subset if and only if $n_q(x) = n_q(y)$). Fix an integer
$m \le q^d$. For any $i \in \{1,\ldots,m\}$, define $\mathcal{X}_i = \mathcal{X}'_i$ and $\mathcal{X}_0 = \mathbb{R}^d \setminus \bigcup_{i=1}^{m} \mathcal{X}_i$, so
that $\mathcal{X}_0,\ldots,\mathcal{X}_m$ form a partition of $\mathbb{R}^d$. Denote by $B_{q,j}$ the ball with
center $n_q(\mathcal{X}_j)$ and radius $1/(2q)$.
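The grid and the nearest-point map used in this construction can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and the nearest point is computed coordinate by coordinate, which is valid because the grid is a product of identical one-dimensional grids; ties on cell boundaries (a set of Lebesgue measure zero) are resolved by the floor operation.

```python
import itertools
import numpy as np

def regular_grid(q, d):
    """All points ((2k_1+1)/(2q), ..., (2k_d+1)/(2q)) with k_i in {0, ..., q-1}."""
    centers = (2 * np.arange(q) + 1) / (2 * q)
    return np.array(list(itertools.product(centers, repeat=d)))

def nearest_grid_point(x, q):
    """Closest grid point to x in R^d; points outside [0,1]^d map to boundary cells."""
    x = np.asarray(x, dtype=float)
    k = np.clip(np.floor(x * q), 0, q - 1)  # index of the cell containing x, clipped
    return (2 * k + 1) / (2 * q)

grid = regular_grid(2, 2)                     # four cell centers in the unit square
center = nearest_grid_point([0.1, 0.9], 2)    # center of the cell containing (0.1, 0.9)
```

Two points belong to the same cell of the partition exactly when `nearest_grid_point` returns the same center for both, and the ball around a center with radius `1/(2*q)` is the inscribed ball of the corresponding cell.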

Define a hypercube 7i = {Pg- : a = (a±, . . . , a m ) G {—1, l} m } of probabil- 
ity distributions P s ofther.v. (X(ti), / 2 (X(t 2 ))) valued in R d x {0, 1} as fol- 
lows. For any G TC the marginal distribution of X(t\) (given X(to) = Xq) 
does not depend on a and has a bounded density /j, w.r.t. the Lebesgue 
measure on R d such that P^-^b) = and 



Pn(Xj) = Pp(B q ,j) = / /i(x) dx = u, j = l,..., 

JB a .i 



rn 



for some $\omega > 0$. In order to ensure that the density $\mu$ remains bounded we
assume that $q^d \omega = O(1)$.
The distribution of $f_2(X(t_2))$ given $X(t_1)$ is determined by the
probability $P_\sigma(f_2(X(t_2)) = 1 \mid X(t_1) = x)$, which is equal to $C_{1,\sigma}(x)$. Define

$$ C_{1,\sigma}(x) = f_1(x) + \sigma_j \varphi(x), \quad x \in \mathcal{X}_j, \quad j = 1,\ldots,m, $$

and $C_{1,\sigma}(x) = f_1(x)$ on $\mathcal{X}_0$, where $\varphi(x) = \gamma_M^{-1/2}\, \psi(q[x - n_q(x)])$, $\psi(x) =
A_\psi \theta(\|x\|)$ with some constant $A_\psi > 0$ and with $\theta : \mathbb{R}_+ \to \mathbb{R}_+$ being a
non-increasing infinitely differentiable function such that $\theta(x) = 1$ on $[0, 1/2]$ and
$\theta(x) = 0$ on $[1, \infty)$. Furthermore, there exist two real numbers $0 < f_- <
f_+ < 1$ such that $f_- \le f_1(x) \le f_+$. Taking $A_\psi$ small enough, we can then
ensure that $0 \le C_{1,\sigma}(x) \le 1$ on $\mathbb{R}^d$. Obviously, it holds $\varphi(x) = A_\psi \gamma_M^{-1/2}$ for
$x \in B_{q,j}$. As to the boundary assumption (2.3), we have

$$ \begin{aligned}
P_\mu\bigl(0 < |f_1(X(t_1)) - C_{1,\sigma}(X(t_1))| \le \delta\bigr)
&= \sum_{j=1}^{m} P_\mu\bigl(0 < |f_1(X(t_1)) - C_{1,\sigma}(X(t_1))| \le \delta,\ X(t_1) \in B_{q,j}\bigr) \\
&= \sum_{j=1}^{m} \int_{B_{q,j}} 1_{\{0 < \varphi(x) \le \delta\}}\, \mu(x)\, dx = m\omega\, 1_{\{A_\psi \gamma_M^{-1/2} \le \delta\}}
\end{aligned} $$

and (2.3) holds provided that $m\omega = O(\gamma_M^{-\alpha/2})$. Let $\tau_M$ be a stopping time
measurable w.r.t. $\mathcal{F}^{\otimes M}$, then the identity (5.20) leads to

e£°[/ t (X(t))] -E pfM [E^o f ?M (X(r M ))} 

t. r 

|A^(X(ti))|l{^ iM ^ n } 



= E 



o®M E p 

with A s (X(ti)) = - Ci^(X(ti)). By conditioning on we 

get 



^0 



E p0 MEj to [|A^(X(tx))|l 



= id ^ * Ep(giM Ep*° 



^(<i))l{n,M#n}l^(*i)eB,j 

= \™,7m /2 Eg' Pf M (n,M / rx). 



Using now the well-known Birgé's or Huber's lemma (see, e.g. Devroye, Györfi
and Lugosi, 1996, p. 243), we get



sup Pf M (n, M + ri) > 
o-G{-l;+l} m 



0.36 A 1 - 



MK H 



where $K_{\mathcal{H}} := \sup_{P,Q \in \mathcal{H}} K(P,Q)$ and $K(P,Q)$ is the Kullback-Leibler distance
between two measures $P$ and $Q$. Since for any two measures $P$ and $Q$ from
$\mathcal{H}$ with $Q \ne P$ it holds

K(P,Q) < sup Eg° [c 1>aa (Xfc)) log j ^YTTTT I 

^i,s 2 e{-i ; +i} m M L lw,a 2 UM*i))J 

+(1 - C^Mbg ^^^ j 

< (i - /+ - - ^r 1 ^ [0 2 (x(t!))i wtl) ^ } ] 

for small enough A^, and log(|7^|) = mlog(2), we get 

sup {eJ!°[/ t ^(X(t))] - E p?M [E^o / w (X(t m ))]} > 

^m W7M 1/2 (l - AMj-'uj) > 7M (1+a)/2 , 

provided that mui > B^/jJ*^ 2 for some B > and AMuj < jm, where ^4 is 
a positive constant depending on /-,/+ and A^,. Using similar arguments, 
we derive 

sup Pf M (|C lj(f (x) - C liM (x)| > <5 7m 1/2 ) > 

a£{-l; + l} m 

for almost x w.r.t. P M , some <5 > and any estimator C\ t M measurable 
w.r.t. T® M . 

5.3 Proof of Proposition 2.3 

Using the arguments similar to ones in the proof of Proposition 2.1, we get 
(5.21) Vb Wo)) - E p «m [V , M (X(to))] < 

Xq 

L-l 
L-l 



1=0 



xl {x(t i )e£ ( } 1 {|c i (x(t i ))-/ i (x(t j ))|>5o}] 



with $\mathcal{E}_l$ defined as in the proof of Proposition 2.1. The first summand on
the right-hand side of (5.21) is equal to zero due to (2.8). Hence,
Cauchy-Schwarz and Minkowski inequalities imply
L-l 



V (X(t ))-E p ® M [V , M (X(to))} < £[E*o|E*« [f Tl+1 (X(t Tl+1 ))} -/,(X(t,))| 2 ] ' 

1=0 

x [e*» p® o M (\ Cl (x( tl )) - QMxmi > ^o)] V2 

< 2B 1 / 2 x; [e^o p®f acprft)) - a, M (x(t ; ))i > 

Now the application of (2.9) finishes the proof. 

5.4 Proof of Proposition 2.8 

Denote 

£k,M( x ) = T [Ck,M]( x ) ~ C k( x ) 

and 

(k,M(x) = C k ,M(x) ~ T[C k ,M]{x) 

for k = 1, . . . , L— 1. Using the elementary inequality | max(a, x)— max(a, y)| < 
|x — y\, which holds for any real numbers a, x and y, we get 

\£kM x )\ ^ \(k,M( x )\ + E i\ £ k+i,M(X{t k+1 ))\\ X(t k ) = x] 

and hence 

L-l 

(5.22) \e kM (x)\ < J2 V[\(lM X (tl))\\X(tk) = x] 

l=k+l 
L-l 

l=k+l 

Note that we take expectation in (5.22) with respect to a new a-algebra 
T which is independent of f® M and {Ci,m} are measurable w.r.t J^® M . 
Hence, random variables {£i,k,M} are J^® M measurable as well. According 
to Lemma 5.2 (see below) 



Pff Uiam{x) > 5y]\\ogh\/Mh*+A < 



^X 



sup|Ci,Af(y)| > sJ\\ogh\/Mh d + 2 A < D 2 exp{-D 3 5)
for almost all x w.r.t. $P_{t_k|t_0}$. Thus,



?T [HM X )\ > S \f\ log h\/Mh*+*A < LD 2 exp{-D 3 5/L) 



Analogously, using Lemma 5.3 one can prove that 

P? M (\£kM*)\ > 6 )< B 4 eM-B 5 Mh d+ ") 
with some positive constants B 4 and B 5 . 

Lemma 5.2. Let assumptions (AX0)-(AX2), (AK1) and (AK2) be fulfilled. 
Then there exist positive constants D±, D 2 and D3, such that for any h satis- 
fying D\hr < \J\ log h\/Mh d the estimates {T[C^m\\ based on the truncated 
local polynomials estimators of degree [(3\ fulfill 

Pf M (sup \T[C kM ](x) - C k (x)\ > 5J\\ogh\/Mh d +A < D 2 exp(-D 3 5), 
for all 5 > 5q and k = 1, . . . , L — 1. 

Lemma 5.3. Let assumptions (AX0)-(AX2), (AK1) and (AK2) be fulfilled
and y/\ log h\/Mh d+2u = o(l) for M — > 00. Then there exist positive con- 
stants D4, D§ and Dq such that for any 5 > D±bP the inequality 

P% M (sup \T[C kM }(x) - C k (x)\ >6)< D 5 exp(-D 6 Mh d+ ») 
\xeA / 

holds for all k = 1, . . . , L — 1. 

Proof. We give the proof only for Lemma 5.2. Lemma 5.3 can be proved in
a similar way. Fix some natural $r$ such that $0 < r < L$ and consider the
matrix $\Gamma = (\Gamma_{u_1,u_2})_{|u_1|,|u_2| \le \lfloor\beta\rfloor}$ with elements

$$ \Gamma_{u_1,u_2} = \frac{1}{Mh^d} \sum_{m=1}^{M} \Bigl(\frac{X^{(m)}(t_r) - x}{h}\Bigr)^{u_1+u_2} K\Bigl(\frac{X^{(m)}(t_r) - x}{h}\Bigr). $$



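The smallest eigenvalue $\lambda_\Gamma$ of the matrix $\Gamma$ satisfies

$$ \begin{aligned}
\lambda_\Gamma &= \min_{\|W\|=1} W^\top \Gamma W \\
&\ge \min_{\|W\|=1} W^\top \mathrm{E}[\Gamma] W + \min_{\|W\|=1} W^\top (\Gamma - \mathrm{E}[\Gamma]) W \\
&\ge \min_{\|W\|=1} W^\top \mathrm{E}[\Gamma] W - \sum_{|u_1|,|u_2| \le \lfloor\beta\rfloor} \bigl|\Gamma_{u_1,u_2} - \mathrm{E}[\Gamma_{u_1,u_2}]\bigr|. \qquad (5.23)
\end{aligned} $$

By Assumption (AX2)

$$ \inf_{x \in A} \min_{\|W\|=1} W^\top \mathrm{E}[\Gamma(x)] W \ge \gamma_0 h^{\nu} $$

with some $\gamma_0 > 0$. For $m = 1,\ldots,M$, and any multi-indices $u_1, u_2$ such that
$|u_1|, |u_2| \le \lfloor\beta\rfloor$, define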



WehaveE Ptr|to [A m (x)] = 0, 



U\+U2 



K 



'x( m \t r 



x 



- / z" 1+M2 K(z)p(t r ,x + /iz|t ,x )^. 



|A m (x)| </i^sup (l + ||z|| 2/3 )K(z) 
zeR d L 



=: K, hT 



and 



E P [A m (x)] 2 < / z 2u i +2u *K 2 (z)p(t r ,x + hz\t ,x )dz 

< Pm^ f {1 + \\ 4 ^ )K 2 {z)dz= . Kh ~d 
~ h d J Rd y 

where $p_{\max} = \sup_{z \in \mathbb{R}^d} p(t_r, z \mid t_0, x_0)$ and $K_1, K_2$ are two positive constants.
Due to assumption (AK2), the class of functions



x : ■ k 

h 



h 



: x£R d , h£R\{0}, |ni|,|n 2 | < L/?J 



is a bounded Vapnik-Cervonenkis class of measurable functions (see Dudley 
(1999)). According to Proposition 6.1 (see Appendix), we have for any £ > 

(5.24) P tr , t0 ( sup |r ui;U2 (x) - ET UUU2 (x)\ > () 

\x£A J 



P*,|*o (sup- 



M 



A m (x) 



m=l 



< L ex.p(-(B Mh d ) 



with some positive constants Lq and -Bo. Combining (5.23) and (2.17) with 
(5.24), we get 

P trlt0 f inf A r (x) < 70/172) < L N 2 exp(- l0 B Mh d+ »/2N 2 ), 

where TVj is the number of elements in the matrix T. Assume that M is large 
enough so that 70/2 > (log M)^ 1 . Then on the set {inf xe _4 Xr(x) > 70/1^/2} 
we have 



$$ |T[\hat C_{r,M}](x) - C_r(x)| \le |\hat C_{r,M}(x) - C_r(x)|, \quad x \in A, $$
since $\sup_{x \in A} C_r(x) \le C_{\max}$. Therefore, it holds for any $\zeta > 0$



P trlt0 [ sup\T[C r , M ](x) - C r (x)\ >()< Pt r |t (J^M*) < 7o/»72 
+ P t , t0 f sup \C r , M {x) - C r (x)\ > (, inf A r (x) > 70/172 



Introduce the matrix Q = (Q m , u )i<m<M, \u\<[(3\ 
' X^^-xY 



with elements 



Qrn,u 



1 K (Xim)(t r )- X 



Mh d 



Denote by Q u the uth column of Q and define 

(«)/ 



I«I<L/3J 

Since T = Q T Q, we get Z T (0)r -1 Q T Q u = l{ u =(o,...,o)} f° r an y s w ^ n l s l — 
L/3J. Hence Z T (0)r -1 Q T Q c = C r {x). Thus, we can write 

C rM {x) - C r (x) = Z T (0)T- 1 (S - Q T Q C ) =: Z J (O)^ 1 e M (x) , 

where £m(x) is a vector valued function with components 



M 



m=l 



'X^-X 
h 



K 



'x<r>-z 
h 



and y r ( ™ } = max(/ r+1 (X( m )(t r+1 )) )T [a r+liM ](x( m )(t r+1 ))). So, on the set 
{inf^ A r (x) > joh"/2} we get 

\C rM (x) ~ C r (x)\ < \\Te M \\ < \t X \\ £ m\\ ^ 2 ^7o _1 |kM|| < 2h~ u ^ 1 N 1 / 2 max \e M . 



Denote 

Agio*) := ^ y r l S-a(x (m) («r)) 

All(x) := 
It holds 

|£M,«| < — > AI; 1 -;,, - 



1 

F 
1 



' v(m) \ u / v (m) s 

Xj- ~X\ K l^r -X 



h 



h 



C r {X^\t r )) - Cr^X^iU)) 



a; ' — x 



K 



'X^-X 
h 



M 



— V A^) 

m=l 



1 M 

— V [aw -ea( 2 ) 



m=l 



+ |EA(%|.
Note that E P



*rl*0 



A (l) 



■ and 



\A£l(x)\<A n h- d , VarfAg^x) 
A<Zl(x) - E [A£l(x)] <A 21 h^ d , Var[A(%(x) 



< A 12 h- d , 
< A 22 h^~ d 



with some positive constants An, A\ 2 , A 2 \ and A 22 not depending on x. 
Proposition 6.1 implies that for any S > 5q > 



p t r |t 



1 M y \ 

— ^ AgL > 5y/\]ogh\/Mh*\ < Lxexp^xllog^l) 

m=l oo / 



with some 
tation 



positive constants Li and Furthermore, due to the represen- 

(z — x) u 

C \c^ u) (x + w(z - x)) - C^ u) (x)} (1 - w)^~ l dw 
Jo L J 



C r (z)-C r , x (z) = lf3\ 

\u\ = \j3\ 



we get for any two points x\ and x 2 in M. d 

na(-) - c r , xl (o - (c r (o - c r , X2 (-))iu < iki -^i 

Now it can be shown (see Dudley (1999)) that the class 



\P-\P\ 



C r {-) — C r ^ x {- t 



- x 



h 



K 



■ — x 



: x £ 



l d , h £l\{0}, |u| < L/3J 



is a 



bounded Vapnik-Cervonenkis class of measurable functions. Hence 

-fl [ A il- E P M *o A «l] >Sy/\\ogh\/MhA <L 2 exp(-6B 2 \\ogh\) 

m=l oo / 



~Pt r \t 

for 5 > So > and some positive constants L 2 and B 2 . Furthermore, using 
the inequality | Ep^ |tQ [A^L]| < A 3 h^, we arrive at 

; u {x)\ > l0 5^\\ogh\/{MhdN^ <L 3 exp(- ( 5 J B3|log/ i |) 
with some positive constants L3 



P* r |t sup \e M 



5^\logh\/Mh d . 



L 3 and B 3 , provided that 67^ N^ 2 A^ < 

□

6 Appendix



6.1 Some results from the theory of empirical processes 



Definition A class $\mathcal{F}$ of functions on a measurable space $(\mathsf{X}, \mathcal{X})$ is called
a bounded Vapnik-Cervonenkis class of functions if there exist positive
numbers $A$ and $\omega$ such that, for any probability measure $P$ on $(\mathsf{X}, \mathcal{X})$ and any
$0 < \rho < 1$,

$$ \mathcal{N}\bigl(\mathcal{F}, L_2(P), \rho \|F\|_{L_2(P)}\bigr) \le \Bigl(\frac{A}{\rho}\Bigr)^{\omega}, \qquad (6.25) $$

where $\mathcal{N}(S, d, \varepsilon)$ denotes the $\varepsilon$-covering number of $S$ in a metric $d$, and
$F := \sup_{f \in \mathcal{F}} |f|$ is the envelope of $\mathcal{F}$. The following proposition is a key tool
for obtaining convergence rates for local type estimators.

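Proposition 6.1 (Talagrand (1994), Giné and Guillou (2001)). Let $\mathcal{F}$ be
a measurable uniformly bounded VC class of functions, and let $\sigma$ and $U$ be
any numbers such that $\sup_{f \in \mathcal{F}} \mathrm{Var}(f) \le \sigma^2$, $\sup_{f \in \mathcal{F}} \|f\|_\infty \le U$ and $0 <
\sigma \le U/2$. Then, there exist a universal constant $B$ and constants $C$ and $L$,
depending only on the VC characteristics $A$ and $\omega$ of the class $\mathcal{F}$, such that

$$ \mathrm{E} \sup_{f \in \mathcal{F}} \Bigl|\sum_{m=1}^{M} \bigl(f(X_m) - \mathrm{E} f(X_1)\bigr)\Bigr| \le B \Bigl[\omega U \log\frac{AU}{\sigma} + \sqrt{\omega}\,\sqrt{M\sigma^2 \log\frac{AU}{\sigma}}\Bigr]. $$
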



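If moreover $\sqrt{M}\sigma \ge C_1 U \sqrt{\log(U/\sigma)}$, there exist constants $L$ and $C$ which
depend only on the VC characteristics of $\mathcal{F}$, such that, for all $\lambda \ge C$ and $t$
satisfying

$$ C \sqrt{M}\sigma \sqrt{\log\frac{U}{\sigma}} \le t \le \lambda \frac{M\sigma^2}{U}, $$

it holds

$$ P\Bigl(\sup_{f \in \mathcal{F}} \Bigl|\sum_{m=1}^{M} \bigl(f(X_m) - \mathrm{E} f(X_1)\bigr)\Bigr| > t\Bigr) \le L \exp\Bigl(-\frac{\log(1 + \lambda/(4L))}{\lambda L}\, \frac{t^2}{M\sigma^2}\Bigr). $$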

Remark 6.2. It can be deduced from the proof of Proposition 6.1 in Giné
and Guillou (2001) that the constant $L$ can be taken independent of $\omega$. The
constant $C$ (and hence $\lambda$) in the case of large $\omega$ can be chosen in the form
$C = \omega C_0$ for some constant $C_0$ not depending on $\omega$.